write amplification
AVS: A Computational and Hierarchical Storage System for Autonomous Vehicles
Wang, Yuxin, He, Yuankai, Shi, Weisong
Autonomous vehicles (AVs) are evolving into mobile computing platforms, equipped with powerful processors and diverse sensors that generate massive heterogeneous data, for example 14 TB per day. Supporting emerging third-party applications calls for a general-purpose, queryable onboard storage system. Yet today's data loggers and storage stacks in vehicles fail to deliver efficient data storage and retrieval. This paper presents AVS, an Autonomous Vehicle Storage system that co-designs computation with a hierarchical layout: modality-aware reduction and compression, hot-cold tiering with daily archival, and a lightweight metadata layer for indexing. The design is grounded with system-level benchmarks on AV data that cover SSD and HDD filesystems and embedded indexing, and is validated on embedded hardware with real L4 autonomous driving traces. The prototype delivers predictable real-time ingest, fast selective retrieval, and substantial footprint reduction under modest resource budgets. The work also outlines observations and next steps toward more scalable and longer deployments to motivate storage as a first-class component in AV stacks.
DumpKV: Learning based lifetime aware garbage collection for key value separation in LSM-tree
Zhuang, Zhutao, Zeng, Xinqi, Chen, Zhiguang
Key\-value separation is used in LSM\-tree to stored large value in separate log files to reduce write amplification, but requires garbage collection to garbage collect invalid values. Existing garbage collection techniques in LSM\-tree typically adopt static parameter based garbage collection to garbage collect obsolete values which struggles to achieve low write amplification and it's challenging to find proper parameter for garbage collection triggering. In this work we introduce DumpKV, which introduces learning based lifetime aware garbage collection with dynamic lifetime adjustment to do efficient garbage collection to achieve lower write amplification. DumpKV manages large values using trained lightweight model with features suitable for various application based on past write access information of keys to give lifetime prediction for each individual key to enable efficient garbage collection. To reduce interference to write throughput DumpKV conducts feature collection during L0\-L1 compaction leveraging the fact that LSM\-tree is small under KV separation. Experimental results show that DumpKV achieves lower write amplification by 38\%\-73\% compared to existing key\-value separation garbage collection LSM\-tree stores with small feature storage overhead.
Offline and Online Algorithms for SSD Management
Flash-based solid-state drives (SSDs) are a key component in most computer systems, thanks to their ability to support parallel I/O at sub-millisecond latency and consistently high throughput. At the same time, due to the limitations of the flash media, they perform writes out-of-place, often incurring a high internal overhead which is referred to as write amplification. Minimizing this overhead has been the focus of numerous studies by the systems research community for more than two decades. The abundance of system-level optimizations for reducing SSD write amplification, which is typically based on experimental evaluation, stands in stark contrast to the lack of theoretical algorithmic results in this problem domain. To bridge this gap, we explore the problem of reducing write amplification from an algorithmic perspective, considering it in both offline and online settings.